Smart Paper Notes

Can segmentation models be trained with fully synthetically generated data?

Virginia Fernandez , Walter Hugo Lopez Pinaya , Pedro Borges , Petru-Daniel Tudosiu , Mark S Graham , Tom Vercauteren , M Jorge Cardoso

Category: Computer Vision

2022-09-17

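To achieve good performance and generalisability, medical image segmentation models should be trained on sizeable datasets with sufficient variability. Because of ethics and governance restrictions and the costs associated with labelling data, scientific development is often stifled, with models trained and tested on limited data. Data augmentation is often used to artificially increase the variability of the data distribution and improve model generalisability. Recent works have explored deep generative models for image synthesis, as such an approach would enable the generation of an effectively unlimited amount of varied data, addressing the generalisability and data access problems. However, many proposed solutions limit the user's control over what is generated. In this work, we propose brainSPADE, a model that combines a synthetic diffusion-based label generator with a semantic image generator. Our model can produce fully synthetic brain labels, with or without a pathology of interest, and then generate a corresponding MRI image in an arbitrary guided style. Experiments show that brainSPADE synthetic data can be used to train segmentation models whose performance is comparable to that of models trained on real data.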


Brain Imaging Generation with Latent Diffusion Models

Walter H. L. Pinaya , Petru-Daniel Tudosiu , Jessica Dafflon , Pedro F da Costa , Virginia Fernandez , Parashkev Nachev , Sebastien Ourselin , M. Jorge Cardoso

Category: Computer Vision

2022-09-15

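Deep neural networks have brought remarkable breakthroughs in medical image analysis. However, because they are data-hungry, the modest dataset sizes in medical imaging projects may be hindering their full potential. Generating synthetic data provides a promising alternative, making it possible to complement training datasets and conduct medical image research at a larger scale. Diffusion models have recently caught the attention of the computer vision community by producing photorealistic synthetic images. In this study, we explore using latent diffusion models to generate synthetic images from high-resolution 3D brain images. We used T1w MRI images from the UK Biobank dataset (N = 31,740) to train our models to learn the probabilistic distribution of brain images, conditioned on covariates such as age, sex, and brain structure volumes. We found that our models created realistic data and that the conditioning variables could be used to control the data generation effectively. In addition, we created a synthetic dataset with 100,000 brain images and made it openly available to the scientific community.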


Morphology-preserving Autoregressive 3D Generative Modelling of the Brain

Petru-Daniel Tudosiu , Walter Hugo Lopez Pinaya , Mark S. Graham , Pedro Borges , Virginia Fernandez , Dai Yang , Jeremy Appleyard , Guido Novati , Disha Mehra , Mike Vella

Category: Computer Vision | Machine Learning

2022-09-07

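Human anatomy, morphology, and associated diseases can be studied using medical imaging data. However, access to medical imaging data is restricted by governance and privacy concerns, data ownership, and the cost of acquisition, which limits our ability to understand the human body. A possible solution to this issue is to create a model able to learn and then generate synthetic images of the human body conditioned on specific characteristics of relevance (e.g., age, sex, and disease status). Deep generative models, in the form of neural networks, have recently been used to create synthetic 2D images of natural scenes. Still, the ability to produce high-resolution 3D volumetric imaging data with correct anatomical morphology has been hampered by data scarcity and by algorithmic and computational limitations. This work proposes a generative model that can be scaled to produce anatomically correct, high-resolution, and realistic images of the human brain, with the quality necessary to allow further downstream analyses. The ability to generate a potentially unlimited amount of data not only enables large-scale studies of human anatomy and pathology without jeopardizing patient privacy, but also significantly advances research in anomaly detection, modality synthesis, learning under limited data, and fair and ethical AI. Code and trained models are available at: https://github.com/amigolab/synthanatomy.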


The Undesirable Dependence on Frequency of Gender Bias Metrics Based on Word Embeddings

Francisco Valentini , Germán Rosati , Diego Fernandez Slezak , Edgar Altszyler

Category: Natural Language Processing | Artificial Intelligence

2023-01-02

Numerous works use word embedding-based metrics to quantify societal biases and stereotypes in texts. Recent studies have found that word embeddings can capture semantic similarity but may be affected by word frequency. In this work we study the effect of frequency when measuring female vs. male gender bias with word embedding-based bias quantification methods. We find that Skip-gram with negative sampling and GloVe tend to detect male bias in high frequency words, while GloVe tends to return female bias in low frequency words. We show these behaviors still exist when words are randomly shuffled. This proves that the frequency-based effect observed in unshuffled corpora stems from properties of the metric rather than from word associations. The effect is spurious and problematic since bias metrics should depend exclusively on word co-occurrences and not individual word frequencies. Finally, we compare these results with the ones obtained with an alternative metric based on Pointwise Mutual Information. We find that this metric does not show a clear dependence on frequency, even though it is slightly skewed towards male bias across all frequencies.
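As a rough illustration of a Pointwise Mutual Information bias score like the one the abstract mentions, the sketch below computes PMI from raw co-occurrence counts and takes the difference between a word's PMI with a female context and with a male context. The function names, the count-based probability estimates, and the sign convention (positive means female-associated) are assumptions made for this example, not details taken from the paper.

```python
import math


def pmi(joint_count, count_a, count_b, total):
    """PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) ), estimated from counts."""
    p_joint = joint_count / total
    p_a = count_a / total
    p_b = count_b / total
    return math.log2(p_joint / (p_a * p_b))


def pmi_bias(cooc_female, cooc_male, count_female, count_male, count_word, total):
    """Hypothetical bias score: PMI(word, female) - PMI(word, male).

    Assumed convention: positive -> female-associated, negative -> male-associated.
    """
    return (pmi(cooc_female, count_word, count_female, total)
            - pmi(cooc_male, count_word, count_male, total))


# Toy counts (invented for illustration): the word co-occurs twice as often
# with the female context as with the male one.
score = pmi_bias(cooc_female=20, cooc_male=10,
                 count_female=100, count_male=100,
                 count_word=50, total=1000)
print(round(score, 6))
```

In this toy formulation the word's own count cancels out of the difference, so multiplying the word's frequency and both of its co-occurrence counts by the same factor leaves the score unchanged.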


On Noisy Evaluation in Federated Hyperparameter Tuning

Kevin Kuo , Pratiksha Thaker , Mikhail Khodak , John Nguyen , Daniel Jiang , Ameet Talwalkar , Virginia Smith

Category: Machine Learning

2022-12-17

Hyperparameter tuning is critical to the success of federated learning applications. Unfortunately, appropriately selecting hyperparameters is challenging in federated networks. Issues of scale, privacy, and heterogeneity introduce noise in the tuning process and make it difficult to evaluate the performance of various hyperparameters. In this work, we perform the first systematic study on the effect of noisy evaluation in federated hyperparameter tuning. We first identify and rigorously explore key sources of noise, including client subsampling, data and systems heterogeneity, and data privacy. Surprisingly, our results indicate that even small amounts of noise can significantly impact tuning methods, reducing the performance of state-of-the-art approaches to that of naive baselines. To address noisy evaluation in such scenarios, we propose a simple and effective approach that leverages public proxy data to boost the evaluation signal. Our work establishes general challenges, baselines, and best practices for future work in federated hyperparameter tuning.


Reinforcement Learning in System Identification

Jose Antonio Martin H. , Oscar Fernandez Vicente , Sergio Perez , Anas Belfadil , Cristina Ibanez-Llano , Freddy Jose Perozo Rondon , Jose Javier Valle , Javier Arechalde Pelaz

Category: Machine Learning | Artificial Intelligence

2022-12-14

System identification, also known as learning forward models, transfer functions, system dynamics, etc., has a long tradition in both science and engineering across different fields. In particular, it is a recurring theme in Reinforcement Learning research, where forward models approximate the state transition function of a Markov Decision Process by learning a mapping from the current state and action to the next state. This problem is commonly framed directly as a Supervised Learning problem. This common approach faces several difficulties due to the inherent complexities of the dynamics to be learned, for example delayed effects, high non-linearity, non-stationarity, partial observability and, more importantly, error accumulation over long time horizons when using bootstrapped predictions (predictions based on past predictions). Here we explore the use of Reinforcement Learning for this problem. We elaborate on why and how this problem fits naturally and soundly as a Reinforcement Learning problem, and present some experimental results that demonstrate RL is a promising technique for solving this kind of problem.


Distributed Bayesian Learning of Dynamic States

Mert Kayaalp , Virginia Bordignon , Stefan Vlaski , Vincenzo Matta , Ali H. Sayed

Category: Machine Learning

2022-12-05

This work studies networked agents cooperating to track a dynamical state of nature under partial information. The proposed algorithm is a distributed Bayesian filtering algorithm for finite-state hidden Markov models (HMMs). It can be used for sequential state estimation tasks, as well as for modeling opinion formation over social networks under dynamic environments. We show that the disagreement with the optimal centralized solution is asymptotically bounded for the class of geometrically ergodic state transition models, which includes rapidly changing models. We also derive recursions for calculating the probability of error and establish convergence under Gaussian observation models. Simulations are provided to illustrate the theory and to compare against alternative approaches.
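For readers unfamiliar with Bayesian filtering on a finite-state HMM, the sketch below shows one predict-and-update step for a single agent. It is only the standard single-agent building block, written under assumed names and toy numbers. The distributed algorithm described in the abstract, in particular how agents combine information over the network, is not reproduced here.

```python
def hmm_filter_step(belief, transition, likelihood):
    """One Bayesian filtering step for a finite-state HMM (single agent).

    belief[i]        : current probability of state i
    transition[i][j] : P(next state = j | current state = i)
    likelihood[j]    : likelihood of the new observation under state j
    """
    n = len(belief)
    # Predict: push the belief through the state transition model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Update: reweight by the observation likelihood, then normalize.
    unnormalized = [predicted[j] * likelihood[j] for j in range(n)]
    norm = sum(unnormalized)
    return [u / norm for u in unnormalized]


# Toy two-state example (all numbers invented for illustration).
belief = [0.5, 0.5]
transition = [[0.9, 0.1],
              [0.1, 0.9]]
likelihood = [0.8, 0.2]
print(hmm_filter_step(belief, transition, likelihood))
```

Calling the function repeatedly, feeding each output belief back in with the likelihood of the next observation, gives sequential state estimation for one agent.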


Differentially Private Adaptive Optimization with Delayed Preconditioners

Tian Li , Manzil Zaheer , Ken Ziyu Liu , Sashank J. Reddi , H. Brendan McMahan , Virginia Smith

Category: Machine Learning

2022-12-01

Privacy noise may negate the benefits of using adaptive optimizers in differentially private model training. Prior works typically address this issue by using auxiliary information (e.g., public data) to boost the effectiveness of adaptive optimization. In this work, we explore techniques to estimate and efficiently adapt to gradient geometry in private adaptive optimization without auxiliary data. Motivated by the observation that adaptive methods can tolerate stale preconditioners, we propose differentially private adaptive training with delayed preconditioners (DP^2), a simple method that constructs delayed but less noisy preconditioners to better realize the benefits of adaptivity. Theoretically, we provide convergence guarantees for our method for both convex and non-convex problems, and analyze trade-offs between delay and privacy noise reduction. Empirically, we explore DP^2 across several real-world datasets, demonstrating that it can improve convergence speed by as much as 4x relative to non-adaptive baselines and match the performance of state-of-the-art optimization methods that require auxiliary data.


Impact of Automatic Image Classification and Blind Deconvolution in Improving Text Detection Performance of the CRAFT Algorithm

Clarisa V. Albarillo , Proceso L. Fernandez Jr

Category: Computer Vision | Machine Learning

2022-11-29

Text detection in natural scenes has been a significant and active research subject in computer vision and document analysis because of its wide range of applications, as evidenced by the emergence of the Robust Reading Competition. One algorithm with good text detection performance in that competition is Character Region Awareness for Text Detection (CRAFT). Employing the ICDAR 2013 dataset, this study investigates the impact of automatic image classification and blind deconvolution as image pre-processing steps to further enhance the text detection performance of CRAFT. The proposed technique automatically classifies the scene images into two categories, blurry and non-blurry, by using a Laplacian operator with 100 as the threshold. Prior to applying the CRAFT algorithm, images that are categorized as blurry are further pre-processed using blind deconvolution to reduce the blur. The results revealed that the proposed method significantly enhanced the detection performance of CRAFT, as demonstrated by its IoU h-mean of 94.47% compared to the original 91.42% h-mean of CRAFT. This even outperformed the top-ranked SenseTime, whose h-mean is 93.62%.
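The blurry versus non-blurry split can be sketched with a common heuristic: compute the variance of the Laplacian response over a grayscale image and call the image blurry when that variance falls below a threshold. The abstract only states that a Laplacian operator with a threshold of 100 is used. Applying the threshold to the variance, the 4-neighbour kernel, and the function names are assumptions made for this example.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    gray is a 2D list of intensities (rows of equal length, at least 3x3).
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(gray[y - 1][x] + gray[y + 1][x]
                             + gray[y][x - 1] + gray[y][x + 1]
                             - 4 * gray[y][x])
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def is_blurry(gray, threshold=100.0):
    """Assumed rule: low Laplacian variance means few sharp edges, so blurry."""
    return laplacian_variance(gray) < threshold


# Toy inputs: a flat image (no edges) and a checkerboard (many sharp edges).
flat = [[128] * 6 for _ in range(6)]
checker = [[255 if (x + y) % 2 else 0 for x in range(6)] for y in range(6)]
print(is_blurry(flat), is_blurry(checker))
```

In a pipeline like the one described, only images flagged as blurry would be passed to blind deconvolution before text detection.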


A Temporal Anomaly Detection System for Vehicles utilizing Functional Working Groups and Sensor Channels

Subash Neupane , Ivan A. Fernandez , Wilson Patterson , Sudip Mittal , Shahram Rahimi

Category: Machine Learning | Artificial Intelligence | Neural and Evolutionary Computing

2022-09-14

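A modern vehicle fitted with sensors, actuators, and Electronic Control Units (ECUs) can be divided into several operational subsystems called Functional Working Groups (FWGs). Examples of these FWGs include the engine system, transmission, fuel system, brakes, etc. Each FWG has associated sensor channels that gauge vehicular operating conditions. This data-rich environment is conducive to the development of Predictive Maintenance (PdM) technologies. Underpinning various PdM technologies is the need for robust anomaly detection models that can identify events or observations which deviate significantly from the majority of the data and do not conform to a well-defined notion of normal vehicular operational behavior. In this paper, we introduce the Vehicle Performance, Reliability, and Operations (VePRO) dataset and use it to create a multi-phased approach to anomaly detection. Utilizing Temporal Convolution Networks (TCN), our anomaly detection system achieves 96% detection accuracy and accurately predicts 91% of true anomalies. The performance of our anomaly detection system improves when sensor channels from multiple FWGs are utilized.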
